@mhauru mhauru commented Dec 18, 2025

I put together a quick sketch of what it would look like to use VarNamedTuple as a VarInfo directly. By that I mean having a VarInfo type that is nothing but accumulators plus a VarNamedTuple that maps each VarName to a tuple (or actually a tiny struct, but anyway) of three values: Stored value for this variable, whether it's linked, and what transform should be applied to convert the stored value back to "model space". I'm calling this new VarInfo type VNTVarInfo (name to be changed later).
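
As a rough illustration of the layout described above, here is a minimal, self-contained sketch. All names in it (`StoredVariable`, `SketchVarInfo`, `to_model_space`) are hypothetical and are not taken from the PR, and a plain `NamedTuple` stands in for VarNamedTuple, which in reality is keyed by VarName.

```julia
# Hypothetical stand-in for the "tiny struct" stored per variable.
struct StoredVariable{V,F}
    value::V        # the value as stored (possibly in linked space)
    linked::Bool    # whether `value` is in linked space
    transform::F    # maps the stored value back to model space
end

# Recover the model-space value by applying the stored transform.
to_model_space(sv::StoredVariable) = sv.transform(sv.value)

# Hypothetical container: per-variable storage plus accumulators, nothing else.
struct SketchVarInfo{V,A}
    values::V
    accs::A
end

# A NamedTuple stands in for VarNamedTuple in this sketch.
values_by_name = (
    mu = StoredVariable(0.3, false, identity),
    sigma = StoredVariable(log(2.0), true, exp),
)

vi = SketchVarInfo(values_by_name, (logprior = 0.0, loglikelihood = 0.0))
sigma = to_model_space(vi.values.sigma)
```

Because every field is concretely typed through the type parameters, a container like this has no separate typed and untyped variants.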

This isn't finished yet, but the majority of tests pass. There are a lot of failures around edge cases like Cholesky and weird VarNames and such, but for most simple models you can do

vi = VNTVarInfo(model)
vi = link!!(vi, model)
evaluate!!(model, vi)

and it'll give you the correct result. unflatten and vi[:] also work.

I'll keep working on this, but at this point I wanted to pause to do some benchmarks, see how viable this is. Benchmark code, very similar to #1182, running evaluate!! on our benchmarking models:

module VIBench

using DynamicPPL, Distributions, Chairmarks
using StableRNGs: StableRNG
include("benchmarks/src/Models.jl")
using .Models: Models

function run()
    rng = StableRNG(23)

    smorgasbord_instance = Models.smorgasbord(randn(rng, 100), randn(rng, 100))

    loop_univariate1k, multivariate1k = begin
        data_1k = randn(rng, 1_000)
        loop = Models.loop_univariate(length(data_1k)) | (; o=data_1k)
        multi = Models.multivariate(length(data_1k)) | (; o=data_1k)
        loop, multi
    end

    loop_univariate10k, multivariate10k = begin
        data_10k = randn(rng, 10_000)
        loop = Models.loop_univariate(length(data_10k)) | (; o=data_10k)
        multi = Models.multivariate(length(data_10k)) | (; o=data_10k)
        loop, multi
    end

    # lda_instance = begin
    #     w = [1, 2, 3, 2, 1, 1]
    #     d = [1, 1, 1, 2, 2, 2]
    #     Models.lda(2, d, w)
    # end

    models = [
        ("simple_assume_observe", Models.simple_assume_observe(randn(rng))),
        ("smorgasbord", smorgasbord_instance),
        ("loop_univariate1k", loop_univariate1k),
        ("multivariate1k", multivariate1k),
        ("loop_univariate10k", loop_univariate10k),
        ("multivariate10k", multivariate10k),
        ("dynamic", Models.dynamic()),
        ("parent", Models.parent(randn(rng))),
        # ("lda", lda_instance),
    ]

    # Print the timing difference between `r` and `ref` in a readable unit.
    function print_diff(r, ref)
        diff = r.time - ref.time
        # Choose the unit from the magnitude, so that negative differences
        # (i.e. speedups) are not always reported in ns.
        magnitude = abs(diff)
        scale, units = if magnitude < 1e-6
            (1e-9, "ns")
        elseif magnitude < 1e-3
            (1e-6, "µs")
        else
            (1e-3, "ms")
        end
        diff = round(diff / scale; digits=1)
        sign = diff < 0 ? "" : "+"
        return println(" ($(sign)$(diff) $units)")
    end

    new = isdefined(DynamicPPL, :(VNTVarInfo))
    prefix = new ? "New" : "Old"

    for (name, m) in models
        println()
        println(name)
        vi = VarInfo(StableRNG(23), m)
        vi_linked = link!!(deepcopy(vi), m)
        # logp = getlogjoint(last(DynamicPPL.evaluate!!(m, vi)))
        # logp_linked = getlogjoint(last(DynamicPPL.evaluate!!(m, vi_linked)))
        # @show logp
        # @show logp_linked
        res = @b DynamicPPL.evaluate!!($m, $vi)
        print("$prefix unlinked: ")
        display(res)
        res = @b DynamicPPL.evaluate!!($m, $vi_linked)
        print("$prefix linked:   ")
        display(res)

        if !new
            svi_nt = SimpleVarInfo(vi, NamedTuple)
            # SimpleVarInfo{NamedTuple} cannot handle every model; record a
            # missing result if evaluation fails.
            try
                res = @b DynamicPPL.evaluate!!($m, $svi_nt)
            catch
                res = missing
            end
            print("SVI NT:       ")
            display(res)
            svi_od = SimpleVarInfo(vi, OrderedDict)
            res = @b DynamicPPL.evaluate!!($m, $svi_od)
            print("SVI OD:       ")
            display(res)
        end
    end
end

run()

end

Results contrasting the new VarInfo with both the old VarInfo and with SimpleVarInfo{NamedTuple} and SimpleVarInfo{OrderedDict}. Some SVI NT results are missing because it couldn't handle the IndexLenses:

simple_assume_observe
New unlinked: 2.778 ns
New linked:   12.201 ns
Old unlinked: 91.414 ns (4 allocs: 128 bytes)
Old linked:   80.752 ns (4 allocs: 128 bytes)
SVI NT:       2.468 ns
SVI OD:       4.941 ns

smorgasbord
New unlinked: 5.375 μs (12 allocs: 6.156 KiB)
New linked:   6.146 μs (18 allocs: 8.750 KiB)
Old unlinked: 16.375 μs (420 allocs: 33.375 KiB)
Old linked:   13.354 μs (325 allocs: 18.609 KiB)
SVI NT:       missing
SVI OD:       357.333 μs (3514 allocs: 98.891 KiB)

loop_univariate1k
New unlinked: 10.625 μs (6 allocs: 16.125 KiB)
New linked:   12.250 μs (6 allocs: 16.125 KiB)
Old unlinked: 64.542 μs (2009 allocs: 86.688 KiB)
Old linked:   58.625 μs (2009 allocs: 86.688 KiB)
SVI NT:       missing
SVI OD:       7.444 μs (6 allocs: 16.125 KiB)

multivariate1k
New unlinked: 11.125 μs (24 allocs: 80.500 KiB)
New linked:   11.250 μs (24 allocs: 80.500 KiB)
Old unlinked: 11.209 μs (29 allocs: 88.625 KiB)
Old linked:   11.208 μs (29 allocs: 88.625 KiB)
SVI NT:       10.708 μs (24 allocs: 80.500 KiB)
SVI OD:       10.833 μs (24 allocs: 80.500 KiB)

loop_univariate10k
New unlinked: 104.750 μs (6 allocs: 192.125 KiB)
New linked:   142.583 μs (6 allocs: 192.125 KiB)
Old unlinked: 752.542 μs (20009 allocs: 913.188 KiB)
Old linked:   614.750 μs (20009 allocs: 913.188 KiB)
SVI NT:       missing
SVI OD:       155.625 μs (6 allocs: 192.125 KiB)

multivariate10k
New unlinked: 107.500 μs (24 allocs: 896.500 KiB)
New linked:   106.459 μs (24 allocs: 896.500 KiB)
Old unlinked: 112.833 μs (29 allocs: 992.625 KiB)
Old linked:   110.500 μs (29 allocs: 992.625 KiB)
SVI NT:       106.000 μs (24 allocs: 896.500 KiB)
SVI OD:       110.292 μs (24 allocs: 896.500 KiB)

dynamic
New unlinked: 1.109 μs (12 allocs: 672 bytes)
New linked:   2.149 μs (43 allocs: 2.406 KiB)
Old unlinked: 1.854 μs (27 allocs: 1.891 KiB)
Old linked:   3.023 μs (53 allocs: 2.922 KiB)
SVI NT:       1.035 μs (12 allocs: 672 bytes)
SVI OD:       6.927 μs (75 allocs: 2.953 KiB)

parent
New unlinked: 2.777 ns
New linked:   10.967 ns
Old unlinked: 113.683 ns (6 allocs: 192 bytes)
Old linked:   106.579 ns (6 allocs: 192 bytes)
SVI NT:       missing
SVI OD:       4.948 ns

I think a fair TL;DR is that for both small models and models with IndexLenses this is many times faster than the old VarInfo, and not far off from SimpleVarInfo when SimpleVarInfo is at its fastest (NamedTuples for small models, OrderedDicts for IndexLenses). I would still like to close that gap a bit. I don't know why linking causes such a large slowdown in some cases (e.g. 2.778 ns unlinked vs. 12.201 ns linked for simple_assume_observe). I suspect it's because the transform system is geared towards assuming we want to vectorise things, and I've hacked this together quickly just to get it to work.

For large models performance is essentially equal, as it should be, because this is about overheads. To fix that, I need to look into using views in some clever way, but that's for later.
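
To make "many times faster" and "essentially equal" concrete, the ratios can be computed from the unlinked timings reported above. This is just arithmetic on numbers copied from the results, converted to nanoseconds. The variable names are illustrative.

```julia
# evaluate!! timings in ns for unlinked VarInfos, copied from the results above.
timings = [
    (model = "simple_assume_observe", new = 2.778, old = 91.414),
    (model = "smorgasbord", new = 5375.0, old = 16375.0),
    (model = "loop_univariate1k", new = 10625.0, old = 64542.0),
    (model = "multivariate10k", new = 107500.0, old = 112833.0),
]

# Speedup of the new VarInfo over the old one, as old time / new time.
speedups = [(t.model, round(t.old / t.new; digits=1)) for t in timings]

for (model, ratio) in speedups
    println(rpad(model, 24), ratio, "x")
end
```

The small and loop models show large ratios because per-variable overhead dominates there, whereas in the multivariate models the time is spent in the distribution computations themselves.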

I think this is a promising start towards being able to say that all of VarInfo, SimpleVarInfo, and VarNamedVector could be replaced with a direct use of VarNamedTuple (as opposed to e.g. VNT wrapping VarNamedVector), and it would be pretty close to being a best-of-all-worlds solution, in that it's almost as fast as SVI and has full support for all models.

Note that the new VNTVarInfo has no notion of typed and untyped VarInfos. They are all as typed as they can be, which should also help simplify code.

I'll keep working on this tomorrow.

@github-actions
Contributor

github-actions bot commented Dec 18, 2025

Benchmark Report

  • this PR's head: 39df57b586e9b58d7b954bb78d758335ece312d3
  • base branch: a848290ec616aad5caed88c0a6add5a2aee62ad0

Computer Information

Julia Version 1.11.8
Commit cf1da5e20e3 (2025-11-06 17:49 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬────────┬───────────────────────────────┬────────────────────────────┬─────────────────────────────────┐
│                       │       │             │        │       t(eval) / t(ref)        │     t(grad) / t(eval)      │        t(grad) / t(ref)         │
│                       │       │             │        │ ─────────┬──────────┬──────── │ ───────┬─────────┬──────── │ ──────────┬───────────┬──────── │
│                 Model │   Dim │  AD Backend │ Linked │     base │  this PR │ speedup │   base │ this PR │ speedup │      base │   this PR │ speedup │
├───────────────────────┼───────┼─────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│               Dynamic │    10 │    mooncake │   true │   367.64 │   350.82 │    1.05 │  10.34 │   11.07 │    0.93 │   3801.86 │   3882.46 │    0.98 │
│                   LDA │    12 │ reversediff │   true │  2686.68 │  2667.55 │    1.01 │   5.04 │    6.66 │    0.76 │  13544.06 │  17762.32 │    0.76 │
│   Loop univariate 10k │ 10000 │    mooncake │   true │ 58615.00 │ 53481.66 │    1.10 │   5.79 │    5.96 │    0.97 │ 339124.34 │ 318731.74 │    1.06 │
├───────────────────────┼───────┼─────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│    Loop univariate 1k │  1000 │    mooncake │   true │  5888.02 │  5339.46 │    1.10 │   5.73 │    5.97 │    0.96 │  33726.24 │  31901.82 │    1.06 │
│      Multivariate 10k │ 10000 │    mooncake │   true │ 32220.60 │ 31029.52 │    1.04 │  10.27 │    9.89 │    1.04 │ 330958.30 │ 306921.54 │    1.08 │
│       Multivariate 1k │  1000 │    mooncake │   true │  3599.15 │  3601.79 │    1.00 │   9.34 │    8.70 │    1.07 │  33626.47 │  31325.62 │    1.07 │
├───────────────────────┼───────┼─────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│ Simple assume observe │     1 │ forwarddiff │  false │     2.66 │     2.41 │    1.10 │   3.82 │    3.93 │    0.97 │     10.15 │      9.47 │    1.07 │
│           Smorgasbord │   201 │ forwarddiff │  false │  1089.45 │  1006.01 │    1.08 │ 135.14 │   68.91 │    1.96 │ 147224.74 │  69325.36 │    2.12 │
│           Smorgasbord │   201 │      enzyme │   true │  1519.11 │  1373.42 │    1.11 │   6.66 │    6.20 │    1.07 │  10114.57 │   8517.55 │    1.19 │
├───────────────────────┼───────┼─────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ forwarddiff │   true │  1501.77 │  1378.27 │    1.09 │  61.76 │   67.97 │    0.91 │  92754.63 │  93686.74 │    0.99 │
│           Smorgasbord │   201 │    mooncake │   true │  1528.65 │  1380.21 │    1.11 │   5.68 │    5.94 │    0.96 │   8679.36 │   8193.78 │    1.06 │
│           Smorgasbord │   201 │ reversediff │   true │  1529.23 │  1380.75 │    1.11 │ 100.66 │  103.82 │    0.97 │ 153929.50 │ 143348.40 │    1.07 │
├───────────────────────┼───────┼─────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│              Submodel │     1 │    mooncake │   true │     3.35 │     3.08 │    1.09 │  11.13 │   10.84 │    1.03 │     37.23 │     33.40 │    1.11 │
└───────────────────────┴───────┴─────────────┴────────┴──────────┴──────────┴─────────┴────────┴─────────┴─────────┴───────────┴───────────┴─────────┘

@penelopeysm
Member

penelopeysm commented Dec 19, 2025

Darn, that is really good.

> tuple (or actually a tiny struct, but anyway) of three values: Stored value for this variable, whether it's linked, and what transform should be applied to convert the stored value back to "model space"

Am I right in saying the latter two are only really needed for DefaultContext?

Edit: Actually, that's a silly question, if not for DefaultContext we don't even need the metadata field at all.

@penelopeysm
Member

Also, I'm just eyeing this PR and thinking that it's a prime opportunity to clean up the varinfo interface, especially with the functions that return internal values when they probably shouldn't.

@mhauru
Member Author

mhauru commented Dec 19, 2025

> Also, I'm just eyeing this PR and thinking that it's a prime opportunity to clean up the varinfo interface, especially with the functions that return internal values when they probably shouldn't.

Yes. I'm first trying to make this work without making huge interface changes, just to make sure this can do everything it needs to do, but I think interface changes should follow close behind, maybe in the same PR or the same release. They'll be much easier to make once there are only two VarInfo types that need to respect them, namely the new one and Threadsafe.

@mhauru mhauru changed the title from "VarNamedTuple as VarInfo" to "VNT Part 5: VarNamedTuple as VarInfo" Dec 19, 2025
@yebai
Member

yebai commented Dec 19, 2025

Looks exciting! Two quick questions: would this be suitable to

  1. Implement simulation-based inference algorithms, e.g. particle MCMC, where the model dimensionality or parameter support could change?
  2. Use the first model run to bootstrap / infer a VarInfo?

@mhauru
Member Author

mhauru commented Dec 19, 2025

  1. Yep. The only thing I foresee being a problem is if some variable turns from e.g. being a Vector to being a Matrix, and you do IndexLens indexing into it. So first you have x[1] and then x[1,1]. That would be a problem. Other than that, should be fine.
  2. Yes. You can use the same type, VNTVarInfo, for both the first run when collecting variables, and for later runs when evaluating with known variables. No need for the typed/untyped distinction.

One thing I haven't benchmarked, and maybe should, is type unstable models. There is a possibility that type unstable models will be slower with the new approach, because VNTVarInfo is pretty aggressive in trying to make element types concrete, and if it keeps trying and failing again and again, that could cost a lot of time. Or it might be a negligible contribution to the performance-disaster that is a type unstable model. Need to benchmark.
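
The concern can be illustrated with a generic "keep the element type concrete, widen on failure" pattern. The sketch below is not how VarNamedTuple is implemented. `push_widening` is a hypothetical helper that only shows why repeatedly failing to stay concrete costs extra allocations and copies.

```julia
# Append `x` to `xs`, widening the element type only when needed.
function push_widening(xs::Vector{T}, x) where {T}
    # Fast path: the new value fits the current element type.
    x isa T && return push!(xs, x)
    # Slow path: allocate a vector with a wider element type and copy everything.
    widened = Vector{typejoin(T, typeof(x))}(xs)
    return push!(widened, x)
end

xs = push_widening(Int[], 1)    # element type stays Int
xs = push_widening(xs, 2)       # still Int, no copy
ys = push_widening(xs, 2.5)     # copies into a vector with an abstract element type
```

In a type-stable model the slow path is never taken after the first run. In a type-unstable model it may be taken on every evaluation, which is the case that would need benchmarking.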

Base automatically changed from mhauru/vnt-for-vaimacc to mhauru/vnt-for-fastldf January 7, 2026 16:52
@yebai
Member

yebai commented Jan 8, 2026

One comment here is to carefully consider the requirements of particle Gibbs during review, so that we have sufficient design provision for

  1. bootstrap an of-type by running the model once.
  2. upstreaming AdvancedPS to Turing using VarNamedTuple and Accumulators

cc @sunxd3 @penelopeysm

@penelopeysm
Member

I think PG should be fine. I think we aren't really removing things so much as shuffling things around and putting them in the right boxes -- so previously where Libtask had to carry a full varinfo, we would now just make it carry a VNT + accumulator tuple.

@mhauru
Member Author

mhauru commented Jan 15, 2026

New benchmarks after some changes:

┌───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┐
│                                   │ VarInfo - dev │       VarInfo - release │       untyped - release │     Simple NT - release │     Simple OD - release │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Simple assume observe                                                                                                                                     │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │    5.083 1e-6 │       0.311 (61271.53x) │  51.500 1e-6   (10.13x) │  50.459 1e-6    (9.93x) │       0.058 (11464.81x) │
│          evaluate!! InitFromPrior │    9.430 1e-9 │ 258.838 1e-9   (27.45x) │  58.899 1e-9    (6.25x) │   4.686 1e-9    (0.50x) │ 174.513 1e-9   (18.51x) │
│         evaluate!! DefaultContext │    3.105 1e-9 │  37.174 1e-9   (11.97x) │ 679.175 1e-9  (218.72x) │   2.161 1e-9    (0.70x) │   4.320 1e-9    (1.39x) │
│                       unflatten!! │    2.472 1e-9 │  14.521 1e-9    (5.88x) │   2.162 1e-9    (0.87x) │   2.158 1e-9    (0.87x) │  53.172 1e-9   (21.51x) │
│                            link!! │  417.000 1e-9 │   1.708 1e-6    (4.10x) │   7.541 1e-6   (18.08x) │   0.000 1e-9    (0.00x) │ 125.000 1e-9    (0.30x) │
│  evaluate!! InitFromPrior, linked │    9.470 1e-9 │ 512.109 1e-9   (54.08x) │  60.164 1e-9    (6.35x) │   4.703 1e-9    (0.50x) │ 171.774 1e-9   (18.14x) │
│ evaluate!! DefaultContext, linked │    3.124 1e-9 │  38.790 1e-9   (12.42x) │ 847.212 1e-9  (271.18x) │   2.161 1e-9    (0.69x) │   4.329 1e-9    (1.39x) │
│               unflatten!!, linked │    2.470 1e-9 │  14.593 1e-9    (5.91x) │   2.162 1e-9    (0.88x) │   2.162 1e-9    (0.88x) │  53.695 1e-9   (21.74x) │
│                              keys │   17.232 1e-9 │  10.873 1e-9    (0.63x) │   2.160 1e-9    (0.13x) │   1.231 1e-9    (0.07x) │   2.162 1e-9    (0.13x) │
│                            subset │  382.411 1e-9 │ 552.714 1e-9    (1.45x) │   2.240 1e-6    (5.86x) │ 805.943 1e-9    (2.11x) │ 146.739 1e-9    (0.38x) │
│                             merge │    2.160 1e-9 │ 353.041 1e-9  (163.48x) │ 886.719 1e-9  (410.61x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Smorgasbord                                                                                                                                               │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │  135.292 1e-6 │        0.255 (1885.40x) │        0.023  (167.07x) │        0.024  (175.99x) │        0.159 (1177.92x) │
│          evaluate!! InitFromPrior │    1.938 1e-6 │  41.667 1e-6   (21.51x) │   7.750 1e-6    (4.00x) │   1.355 1e-6    (0.70x) │  16.292 1e-6    (8.41x) │
│         evaluate!! DefaultContext │  437.500 1e-9 │  10.083 1e-6   (23.05x) │  73.375 1e-6  (167.71x) │ 333.333 1e-9    (0.76x) │  77.084 1e-6  (176.19x) │
│                       unflatten!! │    5.229 1e-6 │ 154.338 1e-9    (0.03x) │   2.162 1e-9    (0.00x) │   3.396 1e-9    (0.00x) │   4.817 1e-6    (0.92x) │
│                            link!! │   39.125 1e-6 │   8.333 1e-6    (0.21x) │ 106.125 1e-6    (2.71x) │ 416.000 1e-9    (0.01x) │   1.125 1e-6    (0.03x) │
│  evaluate!! InitFromPrior, linked │    3.845 1e-6 │  83.708 1e-6   (21.77x) │   6.917 1e-6    (1.80x) │   2.364 1e-6    (0.61x) │  23.375 1e-6    (6.08x) │
│ evaluate!! DefaultContext, linked │    1.194 1e-6 │   6.820 1e-6    (5.71x) │ 101.208 1e-6   (84.73x) │   1.125 1e-6    (0.94x) │  81.916 1e-6   (68.58x) │
│               unflatten!!, linked │    5.312 1e-6 │ 158.453 1e-9    (0.03x) │   2.159 1e-9    (0.00x) │   3.396 1e-9    (0.00x) │   4.833 1e-6    (0.91x) │
│                              keys │    5.292 1e-6 │ 346.875 1e-9    (0.07x) │   2.158 1e-9    (0.00x) │   1.232 1e-9    (0.00x) │   2.158 1e-9    (0.00x) │
│                            subset │  191.625 1e-6 │ 482.916 1e-6    (2.52x) │ 223.292 1e-6    (1.17x) │   4.548 1e-6    (0.02x) │   8.292 1e-6    (0.04x) │
│                             merge │   12.084 1e-6 │  12.875 1e-6    (1.07x) │        0.004  (328.71x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Loop univariate 1k                                                                                                                                        │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │         0.004 │        0.477  (113.90x) │        0.416   (99.30x) │        0.408   (97.49x) │        0.470  (112.18x) │
│          evaluate!! InitFromPrior │   15.791 1e-6 │ 338.667 1e-6   (21.45x) │  68.125 1e-6    (4.31x) │   5.386 1e-6    (0.34x) │ 156.917 1e-6    (9.94x) │
│         evaluate!! DefaultContext │    2.903 1e-6 │  50.875 1e-6   (17.53x) │ 658.542 1e-6  (226.87x) │   1.188 1e-6    (0.41x) │   8.042 1e-6    (2.77x) │
│                       unflatten!! │   69.541 1e-6 │ 437.500 1e-9    (0.01x) │   2.160 1e-9    (0.00x) │   2.467 1e-9    (0.00x) │ 235.229 1e-9    (0.00x) │
│                            link!! │  319.208 1e-6 │   7.166 1e-6    (0.02x) │ 994.917 1e-6    (3.12x) │ 416.000 1e-9    (0.00x) │ 833.000 1e-9    (0.00x) │
│  evaluate!! InitFromPrior, linked │   15.666 1e-6 │ 750.833 1e-6   (47.93x) │  48.541 1e-6    (3.10x) │   5.375 1e-6    (0.34x) │ 161.833 1e-6   (10.33x) │
│ evaluate!! DefaultContext, linked │   13.875 1e-6 │  44.250 1e-6    (3.19x) │ 769.709 1e-6   (55.47x) │   1.196 1e-6    (0.09x) │   8.042 1e-6    (0.58x) │
│               unflatten!!, linked │   67.417 1e-6 │ 439.250 1e-9    (0.01x) │   2.161 1e-9    (0.00x) │   2.468 1e-9    (0.00x) │ 234.426 1e-9    (0.00x) │
│                              keys │   52.917 1e-6 │ 203.412 1e-9    (0.00x) │   2.159 1e-9    (0.00x) │   1.232 1e-9    (0.00x) │   2.160 1e-9    (0.00x) │
│                            subset │         0.018 │ 223.125 1e-6    (0.01x) │        0.019    (1.02x) │ 912.645 1e-9    (0.00x) │ 220.836 1e-9    (0.00x) │
│                             merge │  100.333 1e-6 │ 282.000 1e-6    (2.81x) │        0.388 (3870.36x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Multivariate 1k                                                                                                                                           │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │   40.583 1e-6 │        0.079 (1954.34x) │  69.417 1e-6    (1.71x) │  71.875 1e-6    (1.77x) │  85.667 1e-6    (2.11x) │
│          evaluate!! InitFromPrior │    7.625 1e-6 │   8.125 1e-6    (1.07x) │  10.542 1e-6    (1.38x) │   7.667 1e-6    (1.01x) │   7.750 1e-6    (1.02x) │
│         evaluate!! DefaultContext │    2.334 1e-6 │   2.208 1e-6    (0.95x) │  39.125 1e-6   (16.77x) │   2.042 1e-6    (0.87x) │   1.958 1e-6    (0.84x) │
│                       unflatten!! │    2.468 1e-9 │ 201.640 1e-9   (81.71x) │   2.157 1e-9    (0.87x) │   2.468 1e-9    (1.00x) │ 238.369 1e-9   (96.60x) │
│                            link!! │  709.000 1e-9 │   3.375 1e-6    (4.76x) │  64.083 1e-6   (90.39x) │ 459.000 1e-9    (0.65x) │ 625.000 1e-9    (0.88x) │
│  evaluate!! InitFromPrior, linked │    7.583 1e-6 │   8.625 1e-6    (1.14x) │  10.750 1e-6    (1.42x) │   8.584 1e-6    (1.13x) │   8.125 1e-6    (1.07x) │
│ evaluate!! DefaultContext, linked │    2.083 1e-6 │   2.333 1e-6    (1.12x) │  39.458 1e-6   (18.94x) │   2.250 1e-6    (1.08x) │   2.041 1e-6    (0.98x) │
│               unflatten!!, linked │    2.471 1e-9 │ 196.970 1e-9   (79.72x) │   2.165 1e-9    (0.88x) │   2.468 1e-9    (1.00x) │ 238.765 1e-9   (96.63x) │
│                              keys │   34.064 1e-9 │  29.204 1e-9    (0.86x) │   2.161 1e-9    (0.06x) │   1.236 1e-9    (0.04x) │   2.162 1e-9    (0.06x) │
│                            subset │  413.141 1e-9 │   1.128 1e-6    (2.73x) │   3.625 1e-6    (8.77x) │ 915.323 1e-9    (2.22x) │ 214.130 1e-9    (0.52x) │
│                             merge │    2.164 1e-9 │ 804.100 1e-9  (371.63x) │   2.007 1e-6  (927.56x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Dynamic                                                                                                                                                   │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │         0.002 │        0.241  (127.21x) │        0.008    (4.27x) │        0.008    (4.21x) │        0.112   (58.99x) │
│          evaluate!! InitFromPrior │    1.920 1e-6 │   3.549 1e-6    (1.85x) │   2.035 1e-6    (1.06x) │   1.681 1e-6    (0.88x) │   2.307 1e-6    (1.20x) │
│         evaluate!! DefaultContext │  598.837 1e-9 │ 928.172 1e-9    (1.55x) │   2.156 1e-6    (3.60x) │ 476.143 1e-9    (0.80x) │   2.163 1e-6    (3.61x) │
│                       unflatten!! │    4.165 1e-9 │ 107.417 1e-9   (25.79x) │   2.164 1e-9    (0.52x) │                 nothing │                 nothing │
│                            link!! │    2.083 1e-6 │   9.875 1e-6    (4.74x) │  11.708 1e-6    (5.62x) │ 666.000 1e-9    (0.32x) │   1.333 1e-6    (0.64x) │
│  evaluate!! InitFromPrior, linked │    4.063 1e-6 │   7.292 1e-6    (1.79x) │   3.965 1e-6    (0.98x) │   4.000 1e-6    (0.98x) │   4.535 1e-6    (1.12x) │
│ evaluate!! DefaultContext, linked │    1.675 1e-6 │   2.051 1e-6    (1.22x) │                 nothing │   1.556 1e-6    (0.93x) │   3.370 1e-6    (2.01x) │
│               unflatten!!, linked │    4.292 1e-9 │ 103.982 1e-9   (24.23x) │   2.164 1e-9    (0.50x) │   3.396 1e-9    (0.79x) │   5.075 1e-6 (1182.44x) │
│                              keys │   37.307 1e-9 │  45.906 1e-9    (1.23x) │   2.159 1e-9    (0.06x) │   1.233 1e-9    (0.03x) │   2.160 1e-9    (0.06x) │
│                            subset │    2.742 1e-6 │  11.354 1e-6    (4.14x) │   7.208 1e-6    (2.63x) │   4.743 1e-6    (1.73x) │   7.555 1e-6    (2.76x) │
│                             merge │    7.667 1e-9 │   1.358 1e-6  (177.17x) │   4.030 1e-6  (525.61x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ Parent                                                                                                                                                    │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │    5.542 1e-6 │       0.059 (10711.15x) │  34.250 1e-6    (6.18x) │  35.750 1e-6    (6.45x) │  55.250 1e-6    (9.97x) │
│          evaluate!! InitFromPrior │    9.624 1e-9 │ 375.000 1e-9   (38.96x) │ 119.229 1e-9   (12.39x) │                 nothing │                 nothing │
│         evaluate!! DefaultContext │    3.112 1e-9 │  85.376 1e-9   (27.43x) │ 809.514 1e-9  (260.13x) │                 nothing │                 nothing │
│                       unflatten!! │    2.472 1e-9 │  37.422 1e-9   (15.14x) │   2.164 1e-9    (0.88x) │   2.157 1e-9    (0.87x) │  53.599 1e-9   (21.69x) │
│                            link!! │    1.041 1e-6 │   1.708 1e-6    (1.64x) │   7.958 1e-6    (7.64x) │                 nothing │                 nothing │
│  evaluate!! InitFromPrior, linked │    9.580 1e-9 │ 775.000 1e-9   (80.90x) │ 123.548 1e-9   (12.90x) │                 nothing │                 nothing │
│ evaluate!! DefaultContext, linked │    3.111 1e-9 │  86.527 1e-9   (27.82x) │ 959.759 1e-9  (308.53x) │                 nothing │                 nothing │
│               unflatten!!, linked │    2.470 1e-9 │  37.461 1e-9   (15.17x) │   2.161 1e-9    (0.88x) │                 nothing │                 nothing │
│                              keys │   33.864 1e-9 │  29.848 1e-9    (0.88x) │   2.157 1e-9    (0.06x) │                 nothing │                 nothing │
│                            subset │  763.158 1e-9 │ 675.231 1e-9    (0.88x) │   2.817 1e-6    (3.69x) │                 nothing │                 nothing │
│                             merge │    2.159 1e-9 │ 402.371 1e-9  (186.34x) │   1.315 1e-6  (609.20x) │                 nothing │                 nothing │
├───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┤
│ LDA                                                                                                                                                       │
├───────────────────────────────────┬───────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┬─────────────────────────┤
│          Compilation: Constructor │         0.138 │        0.439    (3.18x) │        0.144    (1.04x) │        0.141    (1.02x) │        0.284    (2.06x) │
│          evaluate!! InitFromPrior │    7.167 1e-6 │  11.480 1e-6    (1.60x) │   7.639 1e-6    (1.07x) │   7.250 1e-6    (1.01x) │  15.541 1e-6    (2.17x) │
│         evaluate!! DefaultContext │    6.115 1e-6 │   7.611 1e-6    (1.24x) │  10.104 1e-6    (1.65x) │   5.517 1e-6    (0.90x) │  14.104 1e-6    (2.31x) │
│                       unflatten!! │    3.111 1e-6 │ 111.918 1e-9    (0.04x) │   2.159 1e-9    (0.00x) │                 nothing │                 nothing │
│                            link!! │    7.708 1e-6 │  12.458 1e-6    (1.62x) │  22.708 1e-6    (2.95x) │   2.833 1e-6    (0.37x) │   3.291 1e-6    (0.43x) │
│  evaluate!! InitFromPrior, linked │    7.417 1e-6 │  15.875 1e-6    (2.14x) │   7.903 1e-6    (1.07x) │   7.792 1e-6    (1.05x) │  16.125 1e-6    (2.17x) │
│ evaluate!! DefaultContext, linked │    6.469 1e-6 │   7.889 1e-6    (1.22x) │  15.333 1e-6    (2.37x) │  10.042 1e-6    (1.55x) │  18.041 1e-6    (2.79x) │
│               unflatten!!, linked │    3.111 1e-6 │ 108.133 1e-9    (0.03x) │   2.158 1e-9    (0.00x) │                 nothing │                 nothing │
│                              keys │  656.977 1e-9 │  97.421 1e-9    (0.15x) │   2.159 1e-9    (0.00x) │   1.233 1e-9    (0.00x) │   2.162 1e-9    (0.00x) │
│                            subset │    2.898 1e-6 │  55.416 1e-6   (19.12x) │  10.729 1e-6    (3.70x) │   4.646 1e-6    (1.60x) │   8.986 1e-6    (3.10x) │
│                             merge │    3.427 1e-6 │   1.986 1e-6    (0.58x) │  46.166 1e-6   (13.47x) │                 nothing │                 nothing │
└───────────────────────────────────┴───────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┴─────────────────────────┘

We are now comfortably matching or beating the old typed VarInfo in evaluation benchmarks. We are slower than SimpleVarInfo{NamedTuple} for simple models, but in the same ballpark, which is pretty much the best we can hope for given how many more features we offer. The only places where we are slower than the old typed VarInfo are unflatten!! and link!! when IndexLenses are involved. I'm trying to understand why.

For some reason our compilation time (more precisely, the first execution of VarInfo(m)) seems to also have gone down massively. I don't know why, but I'll take it. (Note that the "compilation" times for SimpleVarInfo in the results are not a fair comparison, please pay no attention to them.)

@mhauru mhauru requested a review from penelopeysm January 15, 2026 14:07
Co-authored-by: Penelope Yong
end

function unflatten!!(vi::VarInfo, vec::AbstractVector)
function unflatten!!(vi::VarInfo{Linked}, vec::AbstractVector) where {Linked}
Member

Re. comment below this: Did you look into the closure boxing thing, or should I just leave it for another time?

Member Author

I'm testing it out, but so far it hasn't yielded much. I get the feeling the time cost is somewhere else, but I don't yet understand where.

Comment on lines 496 to 500
new_linked = if LinkedLeft == LinkedRight
    LinkedLeft
else
    nothing
end
Member

Pathological, but what if one of them is empty but has the opposite link status? Then we should just take the link status from the non-empty one. Alternatively, only determine the new link status by iterating through the values again after merging.
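
A minimal sketch of the first suggestion, assuming the link status is `true`, `false`, or `nothing` (mixed), as in the snippet under review. The function name and the explicit emptiness flags are hypothetical and not part of the PR.

```julia
# Combine the link status of two containers being merged.
# `linked_*` is true, false, or nothing (mixed); `isempty_*` says whether
# that side holds no variables at all.
function merged_link_status(linked_left, isempty_left::Bool, linked_right, isempty_right::Bool)
    # An empty side carries no information, so defer to the other side.
    isempty_left && return linked_right
    isempty_right && return linked_left
    # Both sides are non-empty: keep the status only if they agree.
    return linked_left == linked_right ? linked_left : nothing
end

status_empty_right = merged_link_status(true, false, false, true)
status_conflict = merged_link_status(true, false, false, false)
```

With this rule an empty container with the opposite link status no longer forces the merged result to `nothing`.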

Member Author

Fair points. I've added a comment, because I don't think this is high priority to fix right now.

Member

Yeah, that's fair, it's just a performance hit rather than a correctness issue.

@mhauru
Member Author

mhauru commented Jan 15, 2026

BTW: VNT seems to use @inbounds liberally. I just found out about 30 seconds ago that apparently this is not good: JuliaStats/Distributions.jl#2005

This is concerning. I should try removing them. I would be surprised if Julia was able to optimise these things itself, because a lot of the @inbounds marks rely on mask and data being the same shape, which depends on them having been constructed that way. However, if it hurts type inference, it could be really bad.
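
For reference, one way to express the "mask and data have the same shape" invariant without `@inbounds` is to enforce it once in the constructor and iterate with `eachindex` over both arrays. This is a generic sketch with hypothetical names (`MaskedData`, `sum_masked`), not VNT's actual code, and it makes no claim about the resulting performance.

```julia
struct MaskedData{T,N}
    data::Array{T,N}
    mask::Array{Bool,N}
    # The inner constructor enforces the shape invariant once, at construction.
    function MaskedData(data::Array{T,N}, mask::Array{Bool,N}) where {T,N}
        size(data) == size(mask) ||
            throw(DimensionMismatch("data and mask must have the same size"))
        return new{T,N}(data, mask)
    end
end

# Sum the entries whose mask is set, with no @inbounds annotation.
function sum_masked(md::MaskedData{T}) where {T}
    total = zero(T)
    # eachindex(a, b) yields indices valid for both arrays and throws if
    # their shapes differ.
    for i in eachindex(md.data, md.mask)
        if md.mask[i]
            total += md.data[i]
        end
    end
    return total
end

md = MaskedData([1.0, 2.0, 3.0], [true, false, true])
total = sum_masked(md)
```

Whether the compiler can then drop the bounds checks on its own would still have to be measured, which is the open question in the comment above.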

@mhauru mhauru requested a review from penelopeysm January 15, 2026 15:51
@mhauru mhauru mentioned this pull request Jan 15, 2026
14 tasks
@mhauru
Member Author

mhauru commented Jan 15, 2026

I think this is good enough to merge. I've listed the residual issues to check and polish here: #1201. @penelopeysm, are you happy?

@sunxd3
Member

sunxd3 commented Jan 16, 2026

🎉
